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Abstract 

Model-based prognostics captures system knowl- 
edge in the form of physics-based models of com- 
ponents, and how they fail, in order to obtain ac- 
curate predictions of end of life (EOL). EOL is 
predicted based on the estimated current state dis- 
tribution of a component and expected profiles 
of future usage. In general, this requires sim- 
ulations of the component using the underlying 
models. In this paper, we develop a simulation- 
based prediction methodology that achieves com- 
putational efficiency by performing only the min- 
imal number of simulations needed in order to 
accurately approximate the mean and variance of 
the complete EOL distribution. This is performed 
through the use of the unscented transform, which 
predicts the means and covariances of a distribu- 
tion passed through a nonlinear transformation. 

In this case, the EOL simulation acts as that non- 
linear transformation. In this paper, we review the 
unscented transform, and describe how this con- 
cept is applied to efficient EOL prediction. As a 
case study, we develop a physics-based model of 
a solenoid valve, and perform simulation experi- 
ments to demonstrate improved computational ef- 
ficiency without sacrificing prediction accuracy. 

1 INTRODUCTION 

Prognostics is an essential technology for improving 
system safety, reliability, and availability. Prognos- 
tics deals with determining the health state of compo- 
nents, and projecting the evolution of the health into 
the future to make end of life (EOL) and remaining 
useful life (RUL) predictions. Model-based prognos- 
tics approaches perform these tasks with the aid of a 
model that captures knowledge about the system, its 
components, and their failures, typically in the form of 

This is an open-access article distributed under the terms 
of the Creative Commons Attribution 3.0 United States Li- 
cense, which permits unrestricted use, distribution, and re- 
production in any medium, provided the original author and 
source are credited. 


a physics-based model that is derived from first prin- 
ciples (Roemer et al., 2005; Byington et al., 2004; 
Saha and Goebel, 2009; Daigle and Goebel, 2010). 

The expression of confidence in a prediction pro- 
vides important information to a decision maker. It 
is therefore critical to properly represent and process 
various sources of uncertainty. EOL and RUL can 
then be, for example, embodied as probability distri- 
butions. These distributions are often dominated by 
the uncertainty of future usage. For the system con- 
sidered here, we assume a single trajectory of future 
usage, which, for a given fault mode, makes the distri- 
bution unimodal (but not necessarily Gaussian). In this 
case, the means and variances of these distributions 
are the most important and useful pieces of informa- 
tion, as they provide information on both the accuracy 
and spread of the prediction. Often, the EOL distribu- 
tion is obtained starting with a distribution describing 
the current state of the system, and propagating that 
distribution forward to EOL. If the representation of 
the distribution is sample-based, as with particle fil- 
ters, then this is straightforward, otherwise, in general, 
a sample-based representation is needed, as often an 
analytical solution is unavailable or intractable. Pre- 
diction is then performed by simulating each sample 
forward to EOL. However, this task can be computa- 
tionally prohibitive due to the large number of samples 
often needed to accurately represent the state distribu- 
tion. 

In this paper, we develop a novel method to increase 
the efficiency of the prediction step. We do this using 
the unscented transform (Julier and Uhlmann, 1997), 
which is a method to predict the mean and covariance 
of a distribution that undergoes a nonlinear transforma- 
tion. In this case, the nonlinear transformation is the 
simulation to EOL. The unscented transform approxi- 
mates the given distribution with deterministically se- 
lected samples, which are then transformed, and the 
mean and covariance of the EOL distribution may be 
computed from the transformed samples. Effectively, 
only the minimal amount of simulations are being per- 
formed, and the samples are chosen in such a way that 
the predicted mean and covariance closely approxi- 
mate the mean and covariance obtained by transform- 



Annual Conference of the Prognostics and Health Management Society, 2010 



Figure 1: Prognostics architecture. 


ing the entire distribution, thus achieving the same re- 
sult at a fraction of the computational cost, both in 
time and memory. Since prediction is the main goal of 
prognostics, computationally efficient prediction is of 
utmost importance. Efficient prediction methods take 
less time, so, therefore, more predictions can be made 
at a faster rate. 

We review the common forms of the unscented 
transform, and develop the new prediction method- 
ology as part of our model-based prognostics frame- 
work (Daigle and Goebel, 2009; 2010). As a case 
study, we construct a detailed physics-based model 
of a solenoid valve that includes models of different 
damage mechanisms and their progression. Solenoid 
valves have application in many domains, and reliable 
performance of these valves is crucial to many com- 
plex systems (Tansel et ai, 2005). We run a set of 
simulation-based prognostics experiments, using the 
solenoid valve model, to demonstrate the application 
of the new prediction methodology and compare it to 
the baseline approach. 

The paper is organized as follows. Section 2 de- 
scribes the prognostics approach. Section 3 presents 
the modeling methodology and develops the model of 
the solenoid valve. Section 4 discusses the damage es- 
timation approach. Section 5 overviews the unscented 
transform and develops the new prediction procedure. 
Section 6 presents comprehensive simulation experi- 
ments applying the framework to the solenoid valve 
case study. Section 7 concludes the paper. 

2 PROGNOSTICS APPROACH 

The problem of prognostics is to predict the EOL 
and/or the RUL of a component. In this section, we 
first formally define the problem of model-based prog- 
nostics. We then describe a general model-based ar- 
chitecture within which a prognostics solution may be 
implemented. 

2.1 Problem Formulation 

In a general model-based prognostics approach, the 
system model may be given by 

x(f) = f(t,x(f),0(f),u(f),v(f)) 

y (£) = h(t,x(t),0(t),u(t),n(t)), 

where x(f) £ R Hx is the state vector, 0(f) € R™ 9 is 
the parameter vector, u(f) £ R n “ is the input vector, 
v(f) £ R" x is the process noise vector, f is the state 
equation, y(f) £ R"» is the output vector, n(f) £ R n " 
is the measurement noise vector, and h is the output 
equation. The parameters 0(f) evolve by some un- 
known process, but, in practice, are typically consid- 
ered to be constant. 

Our goal is to predict EOL at a given time point f /> 
using the discrete sequence of observations up to time 


tp, denoted as yo-.t P - EOL is defined as the time point 
at which the component no longer meets a functional 
requirement (e.g., a valve does not open in the required 
amount of time). This point is often linked to a damage 
threshold, beyond which the component fails to func- 
tion properly. In general, we may express this thresh- 
old as a function of the system state and parameters, 
T’eol( x (£)^ 0(f)), which determines whether EOL has 
been reached, where 

rp t (+\ auw / 1) if EOL is reached 
Wx(*),0(*)) = | O) otherwise. 

Using this function, we can formally define EOL with 

EOL[tp) = axg min T BO l(x(£), 0(f)) = 1, 

t p 

and RUL with 

RUL(t P ) =EOL(tp)-t P . 

Due to the many sources of uncertainty that ex- 
ist in the prediction problem, it is much more useful 
to compute a probability distribution of the EOL or 
RUL, rather than a single prediction point. The goal, 
then, is to compute, at time tp, p(EOL(t p )\yo-.t P ) or 
p(RUL(tp)\yo:t P ). 

2.2 Prognostics Architecture 

We adopt a model-based approach, wherein we de- 
velop detailed physics-based models of components 
and systems that include descriptions of how fault pa- 
rameters evolve in time. These models depend on 
unknown and possibly time-varying wear parameters, 
0(f). Therefore, our solution to the prognostics prob- 
lem takes the perspective of joint state-parameter es- 
timation. In discrete time k, we estimate x k and 0 k , 
and use these estimates to predict EOL and RUL at 
desired time points. Using p{x kp ,O kp |y 0: fc P ) at pre- 
diction time kp, we compute p(EOL kp |yo:fcj= ) and 
p(RUL kp |yo: fcp ). 

We employ the prognostics architecture in Fig. 1 
(Daigle and Goebel, 2010). The system is provided 
with inputs u/ ;: and provides measured outputs y k . 
The fault detection, isolation, and identification (FDII) 
module determines a fault set F, which is used by the 
damage estimation module to determine estimates of 
the states and unknown parameters, represented as a 
probability distribution p(x k , 6 k |yo : fc)- The prediction 
module uses this distribution, along with hypothesized 
future inputs, to compute EOL and RUL as probability 
distributions p(EOL kp \y 0:kp ) and p(RUL kp \y 0:kp ). 
In this paper, we focus on the damage estimation and 
prediction modules, and assume a solution to EDIT 
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Armature Cylinder Port 



Figure 2: Three-way two-position solenoid valve. 


3 SOLENOID VALVE MODELING 

We apply our prognostics approach to a solenoid valve, 
and develop a physics-based model of its nominal and 
faulty behavior. A typical three-way, two-position 
solenoid valve for controlling gas flow is shown in 
Fig. 2. The valve is held in its de-energized position 
by the return spring, as shown in the figure. In this 
position, gas is allowed to pass between the normally 
open port and the cylinder port. To energize the valve, 
a voltage is applied to the solenoid, which produces 
an electromagnetic force that moves the valve stem to- 
wards its energized position until it contacts the seat. 
In this position, gas is allowed to pass between the nor- 
mally closed port and the cylinder port. We refer to the 
de-energized position as the closed position, and the 
energized position as the open position. 

The state x of the solenoid valve is given by 


forces from the gas flowing through the valve, how- 
ever, here, we assume a balanced design in which the 
pressure forces always cancel. 

The solenoid force is given by 




where L(x) is the inductance of the solenoid (Ly- 
shevski et al., 1999; Rahman et al., 1996). The force 
acts to decrease the reluctance of the magnetic circuit 
by decreasing the air gap, which is a function of x, 
thus acting to open the valve. The solenoid current is 
described by 


di(t ) 
dt 


zfe) (“( 


where u(t) is the applied voltage, and R is the coil re- 
sistance (Lyshevski et al., 1999; Rahman et al., 1996; 
Szente and Vad, 2001). The voltage u(t) is the only 
external input considered here, i.e., 

u(i) = [u(t)] . 

The inductance of a solenoid is given by 


L(x) 


N 2 

n{xy 


where N is the number of wire turns in the coil, and 
7 Z is the reluctance of the magnetic circuit. In general, 
reluctance is given by 



x(t) 


"*(*)' 
v(t) , 
AY. 


where x(t) is the valve position, v(t) is the valve ve- 
locity, and i(t) is the solenoid current. We define 
x = 0 as the position of the valve when in the closed 
(de-energized) position, and x = L s as the position of 
the valve when in the open (energized) position, where 
L s is the length of the valve stroke. 

The position derivative is given by v(t), and the ve- 
locity derivative is determined from the forces acting 
on the stem: 

- ' ,V j^ = — ( F e {t ) - k(x(t) - x 0 ) - rv(t) - F c (t )) , 
dt m 


where l is the length of the magnetic circuit, A is the 
cross-sectional area of the circuit, and g is the mag- 
netic permeability of the material. If we define the 
maximum air gap as go, then the actual air gap is given 
by go — x. The reluctance depends on the geometry 
of the solenoid. We may assume a typical geometry in 
which reluctance is described by 

'd t \ , 9o ~ x 

Fix) = — H — , 

ti c A c (i 0 A g 

where the c subscript denotes lumped parameters for 
the core and armature, fi o is the permeability of air, 
and A g is the effective cross-sectional area of the air 
gap (Lyshevski et al., 1999). Therefore, the inductance 
is given by 


where F e (t) is the electromagnetic force, k is the re- 
turn spring constant and x Q is the amount of spring 
compression when the valve is in the closed position 
(where we lump the armature and return spring into a 
single spring), r is the kinetic friction coefficient, and 
F c (t) is the contact force with the seat, which may be 
described by 

! k c (—x), if x < 0, 

0, if 0 < x < L s , 

k c {x L s ), if x L s , 

where k c is the (large) spring constant associated with 
the flexible seats. In general, we may also consider 


L(x) = 


N 2 fj, 0 A g n c A c 


Po Agl c T lx c A c (go X s ) 

and its derivative with respect to x is 

dL( x) _ N 2 noA g A c A 2 


\2 ■ 


dx ( noAgl c + n c A c (g 0 - x)Y 

We select our complete measurement vector as 


y (t) = 


x(t) 

i(t) 

open(t) 
closed(t) ] 
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Figure 3: Nominal solenoid valve operation 


The open(t) and closed(t) measurements are discrete 
sensors which output 1 if the valve is in the fully 
opened or fully closed state: 


open(t) 


f 1, if x(t) > L s 
[0, otherwise 


closed(t) = 



if x(t) < 0 

otherwise. 


Fig. 3 shows a nominal valve cycle. The valve is 
commanded to open at 0 s. The current and mag- 
netic field build up in the solenoid, and soon, enough 
force is produced to overcome friction and the return 

/ \ dL(x) , . 

spring. As the valve moves, the i(t) — - — v(t) term 

ox 

begins to dominate, causing the current to decrease. 
When the valve opens against the seat and stops mov- 
ing, the current increases again, resulting in the cusp 
observed in the current just before 0.05 s. The current 
then increases to its steady state, determined by the ap- 
plied voltage and the coil resistance. At 0.25 seconds, 
the valve is commanded to close by removing the ap- 
plied voltage. The current drains out of the solenoid, 
and soon, the electromagnetic force is no longer strong 
enough to keep the valve in place. The valve begins to 

, , . . dL(x) , . 

close, and, as the i(t) — - — v(t) term again comes to 
ox 

dominate, the current increases briefly until the valve 
fully closes and v(t) becomes 0, resulting in another 
cusp. The current then decreases smoothly to 0. 


3.1 Damage Modeling 

In our modeling methodology, the nominal model is 
extended with damage models. These models describe 


how parameters associated with the degree of valve 
damage progress in time, and allow us to make predic- 
tions of damage progression. From valve documenta- 
tion and historical maintenance records, we have iden- 
tified the most relevant faults for prognostics. The set 
of faults includes friction damage, spring damage, and 
the accumulation of debris on the valve seats. 

A common damage mechanism present in valves is 
sliding wear (Daigle and Goebel, 2009). The equation 
for sliding wear takes on the following form: 

V{t) = HF(t)u(i)|, 

where V (t) is the wear volume, w is the wear coef- 
ficient (which depends on material properties such as 
hardness), F(t) is the sliding force, and v(t) is the slid- 
ing velocity (Hutchings, 1992). Friction will increase 
linearly with sliding wear, because the contact area be- 
tween the sliding bodies becomes greater as surface 
asperities wear down (Hutchings, 1992). We charac- 
terize friction damage by a change in the friction co- 
efficient, and model the damage progression in a form 
similar to sliding wear (Daigle and Goebel, 2009): 

r{t) = w r \F f (t)v(t)\ 

where w r is the wear coefficient, and Ff(t) is the fric- 
tion force defined previously. The friction parameter 
only grows when the valve is moving, so, the friction 
parameter evolves in a step-wise fashion, with damage 
only occurring during the valve’s opening and clos- 
ing motions. As the friction parameter increases, the 
friction force increases, further increasing the rate at 
which the friction parameter grows, resulting in a dam- 
age progression similar to an exponential when viewed 
at large time scales. We define r + as the largest value 
of the friction coefficient at which the valve still actu- 
ates in the required time. So, TEOL^it), 0(t)) = 1 if 
r(f) > r + . 

We assume a similar equation form for spring dam- 
age (Daigle and Goebel, 2009): 

k(t) = -w k \F a (t)v(t)\, 

where Wk is the spring wear coefficient and F s (t) is the 
spring force. The more the spring is used, the weaker 
it becomes, characterized by the change in the spring 
constant. As with friction damage, the spring constant 
only decreases when the valve moves. As the spring 
becomes damaged, the spring force will decrease, and 
so the rate at which spring damage occurs will also 
decrease. We define k~ as the smallest value of the 
spring constant at which the valve still closes in the 
required time. So, Teol( x (()j 0(f)) = 1 if k(t) < 
k~ . 

Another failure relates to the accumulation of partic- 
ulate matter and other forms of debris at the seats. As 
debris builds up, it impedes the valve’s travel and pre- 
vents the valve from fully opening or closing, which, in 
turn, causes leaks through the valve. We assume that 
the accumulation of debris is due to sliding wear. It 
results in a change in the boundary conditions of the 
valve motion. We define L c as the boundary when 
the valve is in the closed position (nominally 0, where 
L c > 0), and L s — L 0 as the boundary when in the 
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open position (nominally L s , where L a > 0). We as- 
sume that the rates of change of the offsets L c and 
grow proportionally to sliding wear: 

L c {t) = w c \F f (t)v(t)\ 

L 0 (t) = w 0 \Ff(t)v(t)\. 

We define Lfi and Lfi as the largest allowable values 
of the offsets. So, TEOL(*-(t),0(t)) = 1 if L c (t) > 
L+ or L 0 (t) > L+. 

4 DAMAGE ESTIMATION 

In the model-based paradigm, damage estimation re- 
duces to joint state-parameter estimation, i.e., compu- 
tation of p(x k , #fc|yo:fc)- A general solution to this 
problem is the particle filler , which may be directly 
applied to nonlinear systems with non-Gaussian noise 
terms. Particle filters offer approximate (suboptimal) 
solutions for systems where optimal solutions are un- 
available or intractable (Arulampalam el al., 2002; 
Cappe et al., 2007). In particle filters, the state dis- 
tribution is approximated by a set of discrete weighted 
samples, called particles. As the number of particles is 
increased, performance increases and the optimal solu- 
tion is approached. 

With particle filters, the particle approximation to 
the state distribution is given by 

where N denotes the number of particles, and for par- 
ticle i, xj. denotes the state vector estimate, 0' k de- 
notes the parameter vector estimate, and w\ denotes 
the weight. The posterior density is approximated by 

N 

p(x fc ,0fc|y O: fc) ~ ^w l k 6 ix i k e i k) (dx k dG k ), 

i= 1 

where ^(dx k dO k ) denotes the Dirac delta func- 

tion located at (xj., 0 k ). 

We employ the sampling importance resampling 
(SIR) particle filter, and implement the resampling step 
using systematic resampling (Kitagawa, 1996). The 
pseudocode for a single step of the SIR filter is shown 
as Algorithm 1. Each particle is propagated forward 
to time k by first sampling new parameter values and 
sampling new states. The particle weight is assigned 
using y k . The weights are then normalized, followed 
by the resampling step 1 . 

Here, the parameters 9 k evolve by some unknown 
process that is independent of the state x k . However, 
we need to assign some type of evolution to the pa- 
rameters. The typical solution is to use a random walk, 
i.e., for parameter 9, 9 k = 9 k - 1 + £k- 1 > where £ k -i is 
typically Gaussian noise. With this type of evolution, 
the particles generated with parameter values closest to 
the true values should be assigned higher weight, thus 
allowing the particle filter to converge to the true val- 
ues. The selected variance of the random walk noise 


’Pseudocode for the systematic resampling algorithm is 
provided in (Arulampalam et al., 2002). 


Algorithm 1 SIR Filter 

Inputs: {(x^ 1 , 0 j,_ 1 ),M 4 _ 1 }A 1 , u*_i:fc,y fc 
Outputs: {(■x.l,0 l k ),w' k }iL 1 

for i — 1 to A do 

ffi », ,) 

Xfc ~p(xfc|xJ._ 1 , 0 J._ 1 ,Ufc_l) 
wl p{yk\xl,0 l k ,u k ) 

end for 

N 
i= 1 

for i = 1 to N do 
wl - wi/W 

end for 

{(xL0fc),i4}A i <- Resample({(xl, 01), wlj-Li) 


determines both the rate of this convergence and the 
estimation performance once convergence is achieved. 

Note that in a particle filter, a certain amount of 
sensor noise must be assumed, but, in practice, the 
discrete position sensors ( open and closed ) have no 
noise, therefore, a small amount of noise must be as- 
sumed within the particle filter for those sensors. 

5 PREDICTION 

Prediction is initiated at a given time kp. Us- 
ing the current joint state-parameter estimate, 
p(xfc p ,0fc p |yo : fep), which represents the most up-to- 
date knowledge of the system at time kp, the goal is 
to compute p(EOL kp \y 0:kp ) and p(RUL kp \y 0:kp ). 
As discussed in Section 4, the particle filter computes 

N 

p(xfcp,0fcp|yo : fcp) « ^^p<5 (x . fcj5i0 i p) (dx fe pd0 fe p). 
i = 1 

We can approximate a prediction distribution n steps 
forward as (Doucet et al., 2000) 

Pfa-kp+m @kp-\-n\y 0:kp) ~ 

N 

W kp S (*i p+n ’ 9 l P +J ( d *k P +ndO k P+n)- 

i = 1 

So, for a particle i propagated n steps forward without 
new data, we may take its weight as w kp . Similarly, 
we can approximate the EOL as 

N 

p(EOL kp |y 0 : fcp) « 'Y^w kp 8 EOL i kp {dEOL kp ). 

i—l 

To compute EOL, then, we propagate each particle for- 
ward to its own EOL and use that particle’s weight at 
kp for the weight of its EOL prediction. 

If an analytical solution exists for the prediction, this 
may be directly used to obtain the prediction from the 
state -parameter distribution. An analytical solution is 
rarely available, so the general approach to solving the 
prediction problem is through simulation. Each parti- 
cle is simulated forward to EOL to obtain the complete 
EOL distribution. The pseudocode for the baseline 
prediction procedure is given as Algorithm 2 (Daigle 


5 
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Algorithm 2 EOL Prediction 

Inputs: {{xi p ,0i p ),wi p }f =1 
Outputs: {EOL\ p ,wl p }i =1 

for i = 1 to A do 


k <— kp 



while Teol(K, 6 l k ) = 0 do 

Predict it/, 

oi+i~p(Ok+i\oi) 

Xfe+l ~p(x fe+ l|x* fc ,0 l fc ,Ufc) 
k <— k + 1 

Xfe <— X fe+1 
nt . ni 

“k r'fc+l 

end while 

EOLl p <- k 

end for 


and Goebel, 2010). Each particle i is propagated for- 
ward until Teol(K i 0]|;) evaluates to 1; at this point 
EOL has been reached for this particle. 

Note that, in general, we may sample new parame- 
ter values 6 , however, the noise considered here should 
typically be considerably less than the noise used for 
the random walk during the estimation phase, as we 
usually assume these parameters are either constant or 
only exhibit very small deviations. Note also that pre- 
diction requires hypothesizing future inputs of the sys- 
tem, u/,., because damage progression is rarely inde- 
pendent of the system inputs. For this reason the inputs 
must be chosen carefully. Here, we assume only a sin- 
gle future input trajectory, i.e., u/, is defined uniquely 
for all values of k. This is a practical assumption for 
the solenoid valve, because the valve is always fully 
opened or fully closed, and a single voltage value u(t) 
is consistently applied for opening the valve. Since 
damage occurs only when the valve is moving, then 
for the purposes of prediction, we may produce an in- 
put sequence that represents a full valve cycle (e.g., 
that of Fig. 3) repeated indefinitely, and, using this, we 
may obtain EOL and RUL predictions in the number 
of valve cycles. 

5.1 Computationally Efficient Prediction 

The computational complexity of the prediction proce- 
dure presented as Algorithm 2 is linear in the number 
of particles, however, each particle may take a variable 
amount of time to simulate to EOL. Particles that pre- 
dict quickly progressing wear will complete quickly, 
while particles that predict slowly progressing wear 
will complete slowly, because many more simulation 
steps will be needed to reach EOL. This problem is 
exacerbated with models that require very small sam- 
pling periods. In fact, particles with very poor wear 
parameter estimates, i.e., close to 0, which correspond 
to very large EOL predictions, may take an exceed- 
ingly long time. Also, these particles may correspond 
to outliers, and, as such, contribute little to the predic- 
tion distribution. 

The only way to reduce the computational effort is 
to reduce the number of particles that are used in the 
prediction step. One approach is to randomly select 


an arbitrary number of particles from the original dis- 
tribution, but the statistics of the original distribution 
may not be preserved. A better approach is to sample 
from the distribution in such a way that the important 
statistical information is preserved, and the EOL distri- 
bution computed from this limited sample set closely 
approximates the statistical properties of the EOL dis- 
tribution computed from the complete set of samples. 

The unscented transform solves this problem. It 
takes a random variable x € K”* , with mean x and 
covariance P xx , which is related to a second random 
variable y by some nonlinear function y = g(x), 
and computes the mean y and covariance P yy us- 
ing a (small) set of deterministically selected weighted 
samples, called sigma points (Julier and Uhlmann, 
1997). For the task of EOL prediction, x is simply 
the joint state-parameter distribution represented by 

{ (K P > e k P ) - «4 P }iLv § is the function that computes 
EOL (i.e., simulates a particle to EOL), and y is the 
EOL. The required mean x and covariance P xx may 
be computed from the particle distributions using the 
formulas for weighted mean and weighted covariance. 

The statistics of y are computed by selecting a set of 
weighted sigma points from x, where X , denotes the 
ith point and w, denotes its weight. The sigma points 
are always chosen such that the mean and covariance 
match those of the original distribution, x and P xx . 
Each sigma point is passed through g to obtain new 
sigma points y, i.e., 

3>i = g(ATi) 

with mean and covariance calculated as 

y = y, WiVi 

i 

P yy = Wi ( yi ~ “ y) T 


The underlying idea of the unscented transform is that 
it is easier to approximate the distribution x than to 
approximate the nonlinear function g. This is the 
idea behind the unscented Kalman filter, where the un- 
scented transform is exploited for nonlinear state es- 
timation (Julier and Uhlmann, 1997; 2004). At each 
step, the unscented transform is applied to the state 
estimate and is used for a single step prediction. In 
contrast, here, we apply the transform to the state- 
parameter distribution at given single time point kp, 
and use this for multi-step predictions to EOL. 

Several methods exist for selecting sigma points. In 
the following sections, we briefly review three com- 
mon unscented transforms, and compare their fidelity 
on an example EOL prediction problem. Detailed per- 
formance results will be presented in Section 6 for 
fault prognosis of the solenoid valve. 

Symmetric Unscented Transform 

In the symmetric unscented transform, 2 n x + 1 sigma 
points are selected symmetrically about the mean in 
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Symmetric Sigma Points (n = 1) Minimal Skew Simplex Sigma Points (wq = 0.5) Spherical Simplex Sigma Points (wq = 0.5) 
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Figure 4: Sigma point locations and weights for a two-dimensional random variable x with x = 0 and P xx = I. 
The gray dots represent the random samples, and the markers represent the sigma points, labeled with their 
weights. 


the following way (Julier and Uhlmann, 2004): 


The remaining weights are computed using 


Wi = 


K 

(n x + «) ’ 

2 (n x + re) ’ 


i = 0 

* = 2 n x 


{ x, i = 0 

x+ (V( n x- + K) Psx) ,i=l,...,n x 
X- ( y/{n x + K)P xx ) ,i = n x + 1, . . . ,2 n x 


where ^ \/ (n x + k)P xx j refers to the ith column of 

the matrix square root of (n x + k)P xx (e-g-, computed 
using the Cholesky decomposition). The number re is a 
free parameter that can be used to tune the higher order 
moments of the distribution. If x is assumed Gaussian, 
then selecting re = 3 — n x is recommended (Julier and 
Uhlmann, 1997). A smaller value of re will bring the 
sigma points closer together. Note that the sigma point 
weights do not directly represent probabilities, so are 
not restricted to the interval [0,1]. 


Minimal Skew Simplex Unscented Transform 

The symmetric unscented transform uses 2 n x + 1 
sigma points, however, it is possible to reduce the 
number of points to n x + 2, while still capturing the 
first two moments of the distribution, thus reducing 
the amount of computation. The minimal skew sigma 
points are such a set, and satisfy an additional con- 
straint in which the skew (third moment) is minimized, 
which reduces the average error for a symmetric distri- 
bution (Julier and Uhlmann, 2002). 

The minimal skew sigma points are selected in a 
constructive manner, first by choosing the set of points 
for n x = 1, and then increasing n x by one until the 
full dimension is reached. The procedure for selecting 
sigma points for dimension n x for a distribution with 
mean 0 and P xx = I, where I is the identity matrix, 
is as follows (Julier and Uhlmann, 2002). First, the 
weight of the 0th sigma point is selected freely as 


w 0 G [0,1]. 


i = 1,2 

i = 3, . . . ,n x + 1 

For the initial dimension size j = 1, where X\ refers 
to the zth sigma point for the y th dimensional space, 
the sigma points are initialized as 

Afj = 0 
X{ = — ? L= 

y/2Wi 

X 2 = 

\/2w\ 

Expanding up to higher dimensions j = 2 , ,n x , the 
higher-dimensional sigma points are recursively de- 
fined as 

i = 0 

i = 1, - - - ,i , 

i= j + 1 



' VJ" 1 ' 

ar 0 



0 

J 


*r J 



i 



^/2il 

3 


0 

1 



1 


< 

_\f 2w 3 




where Oy is a vector of j zeros. The points form a 
simplex (a generalization of the triangle to arbitrary 
dimensions) centered about the origin, with an addi- 
tional point located at the origin (X (l ). 

The sigma points may then be transformed to those 
for mean x and covariance P xx using 

X'i = x + \/P xx X i , 

where \/P xx is the matrix square root of P xx . The 
transformed sigma points y and its statistics are com- 
puted as in the basic unscented transform, using X 
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Spherical Simplex Unscented Transform 

The problem identified with the skew simplex set of 
sigma points is that the weights vary by a factor of 2 nj! 
and the point coordinates vary by a factor of 2" Y!,/2 , 
so with large values of n x , numerical problems may 
arise (Julier, 2003 ). The spherical simplex points still 
use only n x + 2 points, but overcome this issue, plac- 
ing the sigma points on a hypersphere centered at the 
origin, with the Oth sigma point located at the origin. 
These points are constructed in a similar fashion to the 
minimal skew sigma points as follows (Julier, 2003 ). 

First, the weight of the Oth sigma point is selected 
freely as 

w 0 e [o,i]. 

The remaining weights are computed using 

1 - w 0 


The sigma points for dimensional space j = 1 are ini- 
tialized again as 

*o = 0 

X\ = — 7 L= 

\J2w\ 

X 1 = 1 

2 y/2 W 


Expanding up to higher dimensions j = 2 ,n x , the 
higher-dimensional sigma points are recursively de- 
fined as 


X] = 


vi~ !' 

0 


i 

y/j(j+l)w l. 

0, 

J : 

_y/j(j+l)w i_ 


i = 0 

* = !,•••) j , 
i = j + 1 


where 0 j is a vector of j zeros. These points may then 
be transformed for mean x and covariance P xx as be- 
fore. 

Fig. 4 compares the location and weights for a two- 
dimensional random variable for the three different 
transforms. The mean of the random variable is 0, and 
the covariance is I. 


Improved Prediction Procedure 

The improved prediction procedure uses Algorithm 2, 
only instead of the inputs being the particles and their 
weights, the inputs become the sigma points and their 
weights computed from the particle distribution at time 
kp, with a suitable sigma point selection algorithm. 
Fig. 5 shows an example output of the prediction pro- 
cedure for spring damage, using the full particle distri- 
bution (N = 100). Each particle creates a predicted 
trajectory, and determines a single EOL prediction. 
These individual predictions then form the complete 
prediction distribution. 

The predicted EOLs based on simulating the sigma 
points for the different selection algorithms reviewed 




Figure 5: Predicted trajectories and EOL distribution 
for spring damage. 


EOL Probability Mass Function 
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Figure 6: EOL predictions based on sigma points. 


here are shown in Fig. 6. The free parameters were 
selected by hand in this particular example. The pre- 
dicted EOL means and variances are shown in the fig- 
ure. Recall that the aim is to approximate the full 
state-parameter distribution using a small set of sam- 
ples, such that, when transformed to EOL, accurately 
predict the mean and variance of the EOL distribu- 
tion computed using the full distribution. An under- 
or overapproximation of either statistic is undesirable, 
as it misrepresents the EOL corresponding to the cur- 
rent belief state. For this example, in each case, the 
mean EOL predicted with the sigma points matches 
the mean from the full distribution within 0.2% error. 
The variances are less accurate, with around 6% er- 
ror for the symmetric and simplex sigma points, and 
around 16% error for the spherical sigma points. The 
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Table 1: Prognostics Performance Results for N = 500 and M = {x, i, open, closed} 


Fault 

PRMSE 

RSD ro 


RA 



RSD/////, 



T cpu 


Full 

Sym. 

Skew 

Sph. 

Full 

Sym. 

Skew 

Sph. 

Max 

Sym. 

Skew 

Sph. 

r 

3.72 

26.09 

94.39 

94.49 

93.90 

94.48 

19.89 

18.40 

21.35 

18.48 

72.98 

57.39 

53.91 

60.21 

k 

3.90 

14.74 

96.19 

96.20 

96.12 

96.17 

15.21 

14.27 

15.35 

14.42 

61.46 

58.05 

55.85 

58.06 

L c 

5.01 

18.01 

93.62 

93.85 

93.82 

93.77 

22.65 

20.18 

22.35 

21.63 

77.22 

63.32 

61.38 

61.41 

La 

3.05 

17.89 

95.48 

95.59 

95.12 

95.36 

16.95 

16.20 

19.30 

18.61 

67.93 

56.70 

52.69 

53.56 


error in predicted variance may be improved with bet- 
ter selection of the free parameters. An improvement 
in computation time of 18-24% was observed for the 
sigma point method. In this case n x = 5, so the sym- 
metric set has 11 sigma points and the simplex meth- 
ods have 7 points, so there is also a very significant 
improvement in memory requirements for prediction. 


6 RESULTS 


In this section, we evaluate the prognostics perfor- 
mance for the different prediction methods. In each 
case, we predict using the full particle distribution, the 
symmetric sigma points, the minimal skew simplex 
sigma points, and the spherical simplex sigma points, 
in order to compare the accuracy, precision, and com- 
putational cost of the prediction. 

Estimation accuracy is evaluated using percentage 
root mean square error (PRMSE), which expresses rel- 
ative estimation accuracy as a percentage: 


PRMSE = 100 


\ 


Mean/,, 

f Wk~w* k y 


[v K ) \ 


where w k denotes the estimated wear parameter value 
at time k, w^. denotes the true wear parameter value at 
k, and Mean/, denotes the mean over all values of k. 
Estimation spread is calculated using relative standard 
deviation (RSD), computed for the wear parameter dis- 
tribution at each prediction point (every 10 cycles), and 
averaged over all prediction points. The average is de- 
noted as RSD. In computing both PRMSE and RSD, 
we ignore the initial time period associated with esti- 
mation convergence. Convergence of the wear param- 
eter estimate, C w , is computed based on the definition 
of the convergence metric described in (Saxena et al., 
2008), where the convergence of a curve is expressed 
as the distance from the origin to the centroid under the 
curve (a shorter distance is better). We use the absolute 
error of the hidden parameter estimate as the curve. 

For a given prediction point kp, we compute mea- 
sures of accuracy and spread. For accuracy, we use the 
relative accuracy (RA) metric (Saxena et al., 2009): 


RA k „ = 100 1 - 


\RULl p -M^m{RULi p )\ 


RUL 


kp 


We calculate prediction spread using RSD, which we 
denote as RSD ///-/. Both RA and RSD are averaged 
over all prediction points starting from the prediction 
at which a prognostics horizon (RA within a speci- 
fied bound) is first reached (denoted using RA and 

RSDflc/i). 


In order to measure the computational performance, 
at each prediction point we measure the time taken 
for the prediction to be completed, t cpu {kp). For a 
given prediction method, we then compute the percent 
improvement over the time for the full distribution, 
t lpu l {kp), defined as 

rr _ (M - tcpu(kp) I 

J-cpu — -LUU .fulln x 

tcpu \kp ) 

This metric is then averaged over all kp, denoted as 
Tcpu, to summarize percent improvement over the en- 
tire experiment. We characterize the maximum possi- 
ble performance increase by computing T cpu for the 
prediction using a single point representing the mean 
of the state-parameter distribution. This performance 
can be achieved with the sigma point method by se- 
lecting a small enough value of n or wo such that all 
the sigma points are concentrated on the mean, how- 
ever, this would result in a vast underapproximation to 
the variance. 

We consider the case where only a single damage 
mode is actively progressing. Table 1 shows the per- 
formance for each fault for N = 500, and taking 
the complete measurement set M. The random walk 
variances were chosen as fixed values assuming that 
the orders of magnitude of the wear parameters were 
known. Overall, the unknown wear parameter can be 
estimated well. The desired outcome is that the com- 
puted RA and RSD /,.// L using the sigma point meth- 
ods closely approximate those produced using the full 
distribution. In the case of RA, the sigma point meth- 
ods are within 0.5% of the RA calculated using the 
full distribution, meaning that the means of the dis- 
tribution (from which RA is calculated) are predicted 
well. Larger differences are observed when compar- 
ing RSD/////,, as in most cases the sigma point meth- 
ods underapproximate the variance, with a worst-case 
error of 6-14%. The accuracy of variance prediction 
depends on the selected values of the free parameters 
of the unscented transforms, as correctly selected val- 
ues will lead to better approximations. In this case, 
we selected the suggested value of k for the symmet- 
ric sigma points, and this seemed to work well in all 
cases. For the minimal skew simplex sigma points, we 
chose wq = — 1 for r, L c , and L 0 , and Wo = 0.1 for 
k. For the spherical simplex sigma points, we chose 
Wq = 0.1 for k, L c , and L a , and Wq = —1 for r. 
These values were selected manually. 

Over 50% improvement in computation time was 
observed in all cases, coming within 75-90% of the 
maximum possible improvement. Further, only a frac- 
tion of the samples are used in the sigma point meth- 


9 



Annual Conference of the Prognostics and Health Management Society, 2010 


Table 2: Damage Estimation Performance for Spring 
Damage 
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16.44 

38.13 
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5.16 

15.74 
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W 
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4.61 

16.45 

34.90 
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500 

3.00 

21.89 

42.35 

-1 

1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 

0.6 

0.8 


W 0 


ods, saving significantly on memory. At the selected 
prediction points, the valve is in a closed state, and the 
effective n x is only 3 (i.e., only 3 of the states have 
different values between particles). Therefore, for the 
symmetric unscented transform only 2 n x + 1 = 7 
sigma points are required, and only n x + 2 = 5 are 
required for the minimal skew simplex and spherical 
simplex sigma points. For N = 500 this is an im- 
provement of over 98% in memory usage. 

To explore further, we focus on the case of spring 
damage, and vary the number of particles and the mea- 
surement set. Table 2 shows the estimation results. 
Overall, the unknown wear parameter can be estimated 
well with both N = 100 and N = 500. This is also 
true when the measurement set is varied, in fact, us- 
ing only the open and closed indicators, prognostics 
can still be performed, but at the cost of a wider vari- 
ance in the prediction and slower convergence. With 
more particles, PRMSE improves and RSD generally 
increases slightly. Convergence is somewhat better 
with fewer particles as the filter tends to be more ag- 
gressive, whereas additional particles smooth the be- 
havior. Of course, with too few particles, convergence 
may not occur, therefore a reasonably large N must 
usually be chosen. 

Table 3 shows the prediction performance. RA is 
estimated within similar error bounds as in Table 1. 
Again, the sigma point methods usually underapproxi- 
mate RSD «( /,. With fewer particles, the gain in com- 
putational efficiency is smaller, as expected, but gains 
of 20-40% are still observed with N = 100, coming 
within 30-90% of the maximum possible increase, and 
the memory usage improves by at least 93% (i.e., 100 
particles compared to at most 7 sigma points). Notice 
also that for the cases where RS D /,■(/ / is larger, such 
as with M = {open, closed,}, the savings are even 
greater, i.e., the wider the full particle distribution, the 
more of a savings the sigma point methods can offer. 

Overall, these results demonstrate that prediction 
can be achieved much more efficiently with limited 
deviations in prediction performance. The symmetric 
sigma points seemed to provide the largest improve- 
ment in time efficiency, but underapproximated the 
variance the most. These two effects are interrelated. 
The smallest values of the wear parameter in the full 
distribution contribute most to the time cost, so sigma 


Figure 7: Prognostics performance for different values 
of wo for the minimal skew simplex sigma points, for 
k with N = 500 and M = {x, i, open, closed}. 


points concentrated more towards the mean take less 
time to simulate. The minimal skew simplex sigma 
points came closest to the variance of the true distribu- 
tion, but usually with the smallest time improvement. 

The biggest practical difficulty in applying this 
method is the selection of the free parameters, as this 
relates to performance. Using the symmetric sigma 
points with the suggested value of k seemed to al- 
ways work well, requiring no further tuning. There 
is no heuristic available in the literature for the mini- 
mal skew simplex and spherical simplex sigma points. 
In order to examine the sensitivity of the selected 
value of the free parameter on performance, we varied 
the value of wo for the minimal skew simplex sigma 
points for the case of spring damage with N = 500 
and M = {x,i, open, closed}. According to Ta- 
ble 1, the full distribution achieves RA = 96.19% 
and RSD/jj/l = 15.21 cycles. Fig. 7 illustrates how 
these metrics vary over the selected range of wq. In 
general, a large value of w o will spread out the sigma 
points, and therefore increase the predicted variance. 
For w o € [—1-0, 0.5], RA and RSD have little devia- 
tion from the desired values produced by the full dis- 
tribution, so any value within this range will result in 
an acceptable approximation. But, for 'Wq £ (0.5, 0.9], 
the approximated RSD begins to increase significantly, 
and RA decreases with it at a much smaller rate. 

7 CONCLUSIONS 

In this paper, we developed a computationally efficient 
prediction scheme for model-based prognostics based 
on the unscented transform. The unscented transform 
allows the statistics of a distribution passed through a 
nonlinear transformation to be predicted using a min- 
imal set of deterministically selected samples. Ap- 
plying this to the prognostics problem, we are able to 
predict the mean and variance of the EOL accurately, 
and with improved computational efficiency and sig- 
nificantly reduced memory costs. 

Particle filtering approaches have become a popular 
choice for model-based prognostics (e.g., (Saha and 
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Table 3: Comparison of Prognostics Prediction Methods for Spring Damage 
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Sph. 
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14.96 
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55.85 

58.06 
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45.49 
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{x,i} 
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97.16 

97.18 
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14.48 

62.39 
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56.40 

58.81 
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100 
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93.65 
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16.58 

17.83 

16.79 

50.89 

42.83 

27.79 

31.37 
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94.85 

95.02 
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18.25 

16.68 

18.13 

16.93 

67.39 

62.92 

60.81 

63.09 

W 

100 

93.34 

93.34 

93.13 

93.31 

16.65 

15.51 

17.23 

16.22 

47.01 

37.62 

22.19 

25.65 

W 

500 

94.61 

94.77 

94.49 

94.77 

18.32 

16.83 

18.34 

17.14 

65.25 

60.80 

57.84 

60.52 

{open, closed} 

100 

94.46 

94.74 

94.28 

94.65 

22.84 

20.13 

22.83 

21.48 

48.95 

38.53 

16.73 

36.78 

{open, closed} 

500 

94.20 

94.58 

94.14 

94.40 

23.62 

20.50 

23.25 

21.55 

74.61 

69.16 

66.20 

68.29 


Goebel, 2009; Abbas et al., 2007)). The most sig- 
nificant disadvantage is the computational complex- 
ity, as usually a large number of particles are needed 
for accurate estimation, and, subsequently, prediction. 
A related approach to efficient prediction is described 
in (Orchard et al., 2008), however, in this approach, 
random sampling with a smaller number of particles is 
advocated. As described in Section 5, a large number 
of randomly selected particles are needed to correctly 
approximate the statistics of the prediction based on 
the full particle set, so a significant number of parti- 
cles would still be needed. However, the method de- 
scribed in this paper selects only the minimal number 
of points necessary to capture those statistics. This 
number is dependent only on the dimension of the state 
space. This approach is also applicable when a tech- 
nique other than particle filters is used for the estima- 
tion task, as long as the method provides a state distri- 
bution which is to be propagated forward to EOL. Note 
that if the unscented Kalman filter is used, the sigma 
points are already available for prediction. 

The unscented transform is not limited to Gaussian 
distributions, but the prediction method based on it is 
useful only when the mean and variance of the EOL 
distribution are meaningful statistics. For example, for 
a multi-modal distribution, a single mean and variance 
are not meaningful. This could be the case when mul- 
tiple future input trajectories are considered. In this 
case, each mode is associated with one of these trajec- 
tories, and each may be defined by a mean and vari- 
ance. Therefore, the method could be applied to each 
case individually to obtain the means and variances of 
the different modes. 

As part of future work, it is important to determine 
strategies for selecting the free parameters of the dif- 
ferent unscented transforms, as this is the main hurdle 
to practical implementation. As discussed in Section 6, 
the suggested heuristic for selecting k for the symmet- 
ric sigma points worked well. For the remaining meth- 
ods, selection of w o may be difficult, although a value 
between —1 and 0.1 worked well here. Further, scaling 
of the sigma points can also be performed, which intro- 
duces additional free parameters (Julier, 2002). Bring- 
ing the sigma points closer together will speed up com- 
putation, but it should be ensured that the variance es- 
timate remains accurate. A detailed analysis over this 


parameter space is necessary to suggest useful heuris- 
tics in the context of prognostics. One may then envi- 
sion automatic methods to tune the free parameters to 
achieve the desired spread in sigma points to correctly 
approximate EOL mean and variance. Extensions of 
the unscented transform to prediction of higher-order 
moments such as skew and kurtosis may also be useful 
for prognostics. 
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